{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Hackathon #4\n",
    "\n",
    "Written by Eleanor Quint\n",
    "\n",
    "Topics:\n",
    "- Convolutional and pooling layers\n",
    "- Tensor-in Tensor-out programming style\n",
    "\n",
    "This is all setup in a IPython notebook so you can run any code you want to experiment with. Feel free to edit any cell, or add some to run your own code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# We'll start with our library imports...\n",
    "from __future__ import print_function\n",
    "\n",
    "import os  # to work with file paths\n",
    "\n",
    "import tensorflow as tf         # to specify and run computation graphs\n",
    "import numpy as np              # for numerical operations taking place outside of the TF graph\n",
    "import matplotlib.pyplot as plt # to draw plots\n",
    "\n",
    "cifar_dir = '/work/cse496dl/shared/hackathon/04/cifar/'"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We'll use [CIFAR-10](https://www.cs.toronto.edu/~kriz/cifar.html) this time, an image recognition dataset, and we'll reshape the datasets into rank 4 Tensors for use with convolutions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# load CIFAR-10\n",
    "train_images = np.load(os.path.join(cifar_dir, 'cifar10_train_data.npy'))\n",
    "train_images = np.reshape(train_images, [-1, 32, 32, 3]) # `-1` means \"everything not otherwise accounted for\"\n",
    "train_labels = np.load(os.path.join(cifar_dir, 'cifar10_train_labels.npy'))\n",
    "\n",
    "test_images = np.load(os.path.join(cifar_dir, 'cifar10_test_data.npy'))\n",
    "test_images = np.reshape(test_images, [-1, 32, 32, 3]) # `-1` means \"everything not otherwise accounted for\"\n",
    "test_labels = np.load(os.path.join(cifar_dir, 'cifar10_test_labels.npy'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Convolutional Layers\n",
    "\n",
    "TensorFlow implements the convolutional layer with [tf.layers.conv2d](https://www.tensorflow.org/api_docs/python/tf/layers/conv2d). The function that instantiates the layer has three required arguments: input, number of filters, and filter size.\n",
    "\n",
    "Important to keep in mind is\n",
    "\n",
    "1. Input data should be 4-dimensional with shape (batch, height, width, channels), unless the `data_format` argument is specified\n",
    "2. `padding` is 'valid' by default, meaning that only filters which lie fully within the input image will be kept. This will make the resulting image slightly smaller than the input. Use `padding='same'` if image size should be preserved. [TensorFlow padding details](https://www.tensorflow.org/api_guides/python/nn#Notes_on_SAME_Convolution_Padding)\n",
    "3. Specifying `strides=(n,n)` for some `n > 1` will result in an output image multiplicatively smaller than the input by a factor of `n`. Make sure that `n` isn't greater than the filter size unless you intend to completely ignore part of the input."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# A simple conv network\n",
    "# Clear the graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "# note that our placeholder is 4 dimensional now\n",
    "x = tf.placeholder(tf.float32, [None, 32, 32, 3], name='input_placeholder')\n",
    "# let's specify a small conv stack\n",
    "hidden_1 = tf.layers.conv2d(inputs=x, filters=32, kernel_size=5, padding='same', activation=tf.nn.relu, name='hidden_1')\n",
    "hidden_2 = tf.layers.conv2d(inputs=hidden_1, filters=64, kernel_size=5, padding='same', activation=tf.nn.relu, name='hidden_2')\n",
    "# followed by a dense layer output\n",
    "flat = tf.reshape(hidden_2, [-1, 32*32*64]) # flatten from 4D to 2D for dense layer\n",
    "output = tf.layers.dense(flat, 10, name='output')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This architecture of a convolutional stack followed by dense layers for classification is pretty typical. It has the major advantages of making transformations to the data that respect the spatial layout of the data, and using many fewer parameters.\n",
    "\n",
    "The number of parameters in each convolution layer can be calculated as `filter_height * filter_width * in_channels * output_channels`, as opposed to dense layers which have `input_size * output_size` parameters. For example, if we're working CIFAR images, a first layer 5x5 convolution with 32 filters will have `5 * 5 * 3 * 32 = 2400` parameters, compared to a dense layer with 32 neurons' `(32 * 32 * 3) * 32 = 98304` parameters. This is a factor of ~40 decrease, significantly smaller!\n",
    "\n",
    "The convolutional network above has ~700 thousand parameters:\n",
    "\n",
    "| Names     | Type    | Output Shape | Parameters  |\n",
    "| --------- | ------- |:------------:| -----------:|\n",
    "| x         | PH      | [32,32,3]    |             |\n",
    "| hidden_1  | conv    | [32,32,32]   |        2400 |\n",
    "| hidden_2  | conv    | [32,32,64]   |       51200 |\n",
    "| flat      | reshape | [65536]      |             |\n",
    "| output    | dense   | [10]         |      655360 |\n",
    "| **total** |         |              |      708960 |\n",
    "\n",
    "The equivalent of the the fully connected networks we specified in hackathons 2 & 3 would have ~600 thousand parameters:\n",
    "\n",
    "| Names     | Type    | Output Shape | Parameters  |\n",
    "| --------- | ------- |:------------:| -----------:|\n",
    "| x         | PH      | [3072]       |             |\n",
    "| hidden_1  | dense   | [200]        |      614400 |\n",
    "| output    | dense   | [10]         |        2000 |\n",
    "| **total** |         |              |      616400 |\n",
    "\n",
    "We've managed to increase the number of parameters! This is because the tensor output by each conv is of size `[image_height, image_width, # filters]`, which can get very large. Generally, early in the network we start with fewer filters and increase the number as we progress through the network. This is partly because, as we increase the number of filters, we can make the image size smaller. Let's do this by introducing pooling to the network.\n",
    "\n",
    "### Pooling Layers\n",
    "\n",
    "Pooling shrinks the image by applying a reducing function to small spatial patches of the image. The most popular pooling operation is applying a `max`, implemented by [tf.layers.max_pooling2d\n",
    "](https://www.tensorflow.org/api_docs/python/tf/layers/max_pooling2d). It requires the 4D tensor input, the pool size, and pool stride. Generally, these last two arguments should be the same and small, e.g., 2 or 3. Let's see the effect of pooling on tensor sizes and parameter numbers. Just like for a convolution, `padding` defaults to 'valid'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# A simple conv network with pooling\n",
    "# Clear the graph\n",
    "tf.reset_default_graph()\n",
    "\n",
    "x = tf.placeholder(tf.float32, [None, 32, 32, 3], name='data_placeholder')\n",
    "# let's specify a conv stack\n",
    "hidden_1 = tf.layers.conv2d(x, 32, 5, padding='same', activation=tf.nn.relu, name='hidden_1')\n",
    "pool_1 = tf.layers.max_pooling2d(hidden_1, 2, 2, padding='same')\n",
    "hidden_2 = tf.layers.conv2d(pool_1, 64, 5, padding='same', activation=tf.nn.relu, name='hidden_2')\n",
    "pool_2 = tf.layers.max_pooling2d(hidden_2, 2, 2, padding='same')\n",
    "# followed by a dense layer output\n",
    "flat = tf.reshape(hidden_2, [-1, 8*8*64]) # flatten from 4D to 2D for dense layer\n",
    "output = tf.layers.dense(flat, 10, name='output')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This new network with pooling shrinks the tensor size as it progresses and ends up with ~100 thousand parameters, a factor of ~6 difference!\n",
    "\n",
    "| Names     | Type    | Output Shape | Parameters  |\n",
    "| --------- | ------- |:------------:| -----------:|\n",
    "| x         | PH      | [32,32,3]    |             |\n",
    "| hidden_1  | conv    | [32,32,32]   |        2400 |\n",
    "| pool_1    | max_pool| [16,16,32]   |             |\n",
    "| hidden_2  | conv    | [16,16,64]   |       51200 |\n",
    "| pool_2    | max_pool| [8,8,64]     |             |\n",
    "| flat      | reshape | [4096]       |             |\n",
    "| output    | dense   | [10]         |       40960 |\n",
    "| **total** |         |              |       94560 |\n",
    "\n",
    "Now, let's take an aside to [visualize a CNN](http://scs.ryerson.ca/~aharley/vis/conv/flat.html). Additionally, there are many types of [convolution](https://www.tensorflow.org/api_guides/python/nn#Convolution) and [pooling](https://www.tensorflow.org/api_guides/python/nn#Pooling) available in TensorFlow."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Tensor-in Tensor-out Programming\n",
    "\n",
    "Up to this point, we've been using the lower-case versions of each layer from [tf.layers](https://www.tensorflow.org/api_docs/python/tf/layers). These expect their input at construction time and return handles to their output `Tensor`s. They naturally lead to a very procedural type of code for specifying networks and restricts us from re-using layers. Luckily, tf.layers also offers uppercase versions of each layer which construct the layer without needing the input and, instead of returning a handle to the `Tensor`, return an \"unbuilt\" layer object which may later be called with inputs (like ordinary Python functions) and only they return the output `Tensor`. This allows re-use of layers and a more object-oriented style of programming with easier composition of layers and blocks (sequential stacks of layers and other blocks).\n",
    "\n",
    "The function below specifies a network without requiring inputs. If we try to do anything (like count the number of parameters) with the unbuilt layers, we'll get an error:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "tf.reset_default_graph()\n",
    "def my_conv_block(inputs, filters):\n",
    "    \"\"\"\n",
    "    Args:\n",
    "        - inputs: 4D tensor of shape NHWC\n",
    "        - filters: iterable of ints of length 3\n",
    "    \"\"\"\n",
    "    with tf.name_scope('conv_block') as scope:\n",
    "        first_conv = tf.layers.Conv2D(filters[0], 3, 1, padding='same')\n",
    "        second_conv = tf.layers.Conv2D(filters[1], 3, 1, padding='same')\n",
    "        third_conv = tf.layers.Conv2D(filters[2], 3, 1, padding='same')\n",
    "        pool = tf.layers.MaxPooling2D(2, 2, padding='same')\n",
    "        output_tensor = pool(third_conv(second_conv(first_conv(inputs))))\n",
    "        \n",
    "        layer_list = [first_conv, second_conv, third_conv, pool]\n",
    "        block_parameter_num = sum(map(lambda layer: layer.count_params(), layer_list))\n",
    "        print('Number of parameters in conv block with {} input: {}'.format(inputs.shape, block_parameter_num))\n",
    "        return output_tensor\n",
    "    \n",
    "x = tf.placeholder(shape=[None, 32, 32, 3], dtype=tf.float32)\n",
    "conv_x = my_conv_block(x, [16, 32, 64])\n",
    "print('shape output from conv block: {}'.format(conv_x.shape))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Hackathon 4 Exercise 1\n",
    "\n",
    "Specify two functions that build conv blocks, as above, but one should use regular convolutions and the other should use [separable convolutions](tf.layers.SeparableConv2D). Make sure that each function accepts arguments for the number of filters, type of activations, and regularization. Then, construct a small convolutional network that classifies CIFAR10 and count the number of parameters in it.\n",
    "\n",
    "You might find that this style of code is useful in homework to more easily handle and construct layers, blocks, and networks. Remember to submit trained parameters to Handin ASAP to get into the running for extra credit!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# NUMBER OF PARAMETERS: <YOUR NUMBER HERE>\n",
    "\n",
    "# Your code here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Coda\n",
    "\n",
    "### Convolutional Filters from the First Layer of ImageNet\n",
    "\n",
    "Interestingly, automatically learned filters often closely resemble [Gabor Filters](https://en.wikipedia.org/wiki/Gabor_filter). The first layer of the original ImageNet network learned the following filters:\n",
    "\n",
    "![](http://smerity.com/media/images/articles/2016/imagenet_conv_kernels.png \"Convolutional Filters from the First Layer of ImageNet\")\n",
    "\n",
    "### [Visualizing Convolutional Features](https://distill.pub/2017/feature-visualization/)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "TensorFlow 1.12 (py36)",
   "language": "python",
   "name": "tensorflow-1.12-py36"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
